Ptolemaic Indexing
This paper discusses a new family of bounds for use in similarity search,
related to those used in metric indexing, but based on Ptolemy's inequality,
rather than the metric axioms. Ptolemy's inequality holds for the well-known
Euclidean distance, but is also shown here to hold for quadratic form metrics
in general, with Mahalanobis distance as an important special case. The
inequality is examined empirically on both synthetic and real-world data sets
and is also found to hold approximately, with a very low degree of error, for
important distances such as the angular pseudometric and several Lp norms.
Indexing experiments demonstrate a highly increased filtering power compared to
existing, triangular methods. It is also shown that combining the Ptolemaic and
triangular filtering can lead to better results than using either approach on
its own.
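
As a rough illustration of the kind of bound the abstract refers to, the sketch below compares a triangular lower bound with a Ptolemaic one for a query q, an object o, and two pivots p and s. It assumes Euclidean distance and the standard rearrangement of Ptolemy's inequality, d(q,p)·d(o,s) <= d(q,o)·d(p,s) + d(q,s)·d(o,p). The function names are hypothetical and are not taken from the paper.

```python
import math


def triangle_lower_bound(d_qp, d_op):
    """Lower bound on d(q, o) from one pivot p, via the triangle inequality."""
    return abs(d_qp - d_op)


def ptolemaic_lower_bound(d_qp, d_qs, d_op, d_os, d_ps):
    """Lower bound on d(q, o) from two distinct pivots p and s.

    Rearranges Ptolemy's inequality; requires d(p, s) > 0.
    """
    return abs(d_qp * d_os - d_qs * d_op) / d_ps


def combined_lower_bound(q, o, p, s, dist=math.dist):
    """Take the tighter (larger) of the triangular and Ptolemaic bounds."""
    d_qp, d_qs = dist(q, p), dist(q, s)
    d_op, d_os = dist(o, p), dist(o, s)
    d_ps = dist(p, s)
    return max(
        triangle_lower_bound(d_qp, d_op),
        triangle_lower_bound(d_qs, d_os),
        ptolemaic_lower_bound(d_qp, d_qs, d_op, d_os, d_ps),
    )


# Illustrative points only; not data from the paper.
q, o = (0.0, 0.0), (3.0, 4.0)
p, s = (1.0, 0.0), (0.0, 2.0)
bound = combined_lower_bound(q, o, p, s)
print(f"lower bound {bound:.3f} <= true distance {math.dist(q, o):.3f}")
```

In a range query with radius r, an object can be discarded without computing d(q, o) whenever the bound exceeds r. Taking the maximum of both bounds corresponds to the combined filtering mentioned in the abstract.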
Optimal Metric Search Is Equivalent to the Minimum Dominating Set Problem
In metric search, worst-case analysis is of little value, as the search
invariably degenerates to a linear scan for ill-behaved data. Consequently,
much effort has been expended on more nuanced descriptions of what performance
might in fact be attainable, including heuristic baselines like the AESA
family, as well as statistical proxies such as intrinsic dimensionality. This
paper gets to the heart of the matter with an exact characterization of the
best performance actually achievable for any given data set and query.
Specifically, linear-time objective-preserving reductions are established in
both directions between optimal metric search and the minimum dominating set
problem, whose greedy approximation becomes the equivalent of an oracle-based
AESA, repeatedly selecting the pivot that eliminates the most of the remaining
points. As an illustration, the AESA heuristic is adapted to downplay the role
of previously eliminated points, yielding some modest performance improvements
over both the original and its younger relative, iAESA2.
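
The greedy, oracle-based selection described above can be sketched for a simple setting. The code below is a simplified illustration, not the paper's reduction or its adapted AESA heuristic: it assumes a range query with radius r, elimination via the triangle inequality only, and an oracle that already knows every query-to-point distance. The function name and the tie-breaking rule are hypothetical.

```python
import math


def greedy_oracle_search(points, q, r, dist=math.dist):
    """Greedy pivot selection with oracle knowledge of all query distances.

    Each step evaluates the point that resolves the most unresolved points,
    mirroring the greedy approximation for minimum dominating set.
    Returns (pivots evaluated in order, indices of points within r of q).
    """
    n = len(points)
    dq = [dist(q, p) for p in points]  # oracle: known in advance (assumption)
    remaining = set(range(n))          # points neither evaluated nor eliminated
    pivots = []

    def resolved_by(p):
        # Evaluating p resolves p itself, plus every remaining point whose
        # triangular lower bound |d(q,p) - d(p,o)| exceeds the radius.
        return {
            o for o in remaining
            if o == p or abs(dq[p] - dist(points[p], points[o])) > r
        }

    while remaining:
        candidates = [i for i in range(n) if i not in pivots]
        # Largest resolved set wins; ties go to the lowest index.
        best = max(candidates, key=lambda p: (len(resolved_by(p)), -p))
        remaining -= resolved_by(best)
        pivots.append(best)

    # A point within the radius is never eliminated, so it is always evaluated.
    result = [i for i in pivots if dq[i] <= r]
    return pivots, result


# Illustrative data only.
data = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (9.0, 1.0)]
pivots, hits = greedy_oracle_search(data, q=(0.5, 0.0), r=1.0)
print("distance computations:", len(pivots), "hits:", sorted(hits))
```

The number of pivots evaluated is the search cost in distance computations. A real search has no oracle, so this serves only as a baseline for what a pivot-selection heuristic could attain under these assumptions.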